Variant Discovery    ◾    123

#6- Sorting and indexing BAM files

#Sorting BAM files

mkdir sortedbam

cd bam

for i in $(ls *.bam);

do

samtools sort -T ../sortedbam/tmp.sort -o ../sortedbam/${i}

${i}

done

cd ..

#Indexing the sorted BAM files

cd sortedbam

for i in $(ls *.bam);

do

samtools index ${i}

done

cd ..

#7- Marking/removing duplicate alignments

mkdir dupRemBam

cd sortedbam

for i in $(ls *.bam);

do

samtools rmdup ${i} ../dupRemBam/${i} 2> ../dupRemBam/${i}.log

done;

cd ..

#8- Alignment pileup and variant calling using bcftools

mkdir variants

cd dupRemBam

for i in $(ls *.bam);

do

samtools index ${i}

done;

ls *.bam | rev | rev > bam_list.txt

bcftools mpileup -Ou \

-f ../ref/GCF_009858895.2_ASM985889v3_genomic.fna \

-b bam_list.txt \

| bcftools call -vmO z \

-o ../variants/sarscov2.vcf.gz

cd ..

#9- Index the VCF file

tabix variants/sarscov2.vcf.gz

#to view contents

bcftools view variants/sarscov2.vcf.gz | less -S

The resulted VCF file, which contains the genotypes of all samples (Figure 4.4), can be

further analyzed as we will discuss at the end of this chapter. However, before analysis, the

identified variants must be filtered.